Abstract

The game-theoretic field of Collective INtelligence (COIN) concerns the design of computer-based
players engaged in a non-cooperative game so that, as those players pursue their self-interests,
a pre-specified global goal for the collective computational system is achieved “as a
side-effect”. Previous implementations of COIN algorithms have outperformed conventional
techniques by up to several orders of magnitude, on domains ranging from telecommunications
control to optimization in congestion problems. Recent mathematical developments
have revealed that these previously developed game-theory-motivated algorithms were based
on only two of the three factors determining performance. Consideration of only the third
factor would instead lead to conventional optimization techniques like simulated annealing
that have little to do with non-cooperative games. In this paper we present an algorithm
based on all three terms at once. This algorithm can be viewed as a way to modify simulated
annealing by recasting it as a non-cooperative game, with each variable replaced by a
player. This recasting allows us to leverage the intelligent behavior of the individual players
to substantially improve the exploration step of the simulated annealing. Experiments are
presented demonstrating that this recasting improves simulated annealing by several orders
of magnitude for spin glass relaxation and bin-packing.


1 Introduction

There are two general types of distributed systems that are found in nature and that researchers 
have translated into computational algorithms for function maximization. The first is exemplified 
by neo-Darwinian natural selection, which has been translated into genetic algorithms (GAs) [5].
The function G maximized in these distributed systems takes as argument any single one of the 
system’s variables. (Each of those variables is viewed as a “genome”, with G of a genome being 
the “fitness” of the “phenotype” induced by that genome.) 

Whereas systems of this first type have a “narrow G”, in the second type of distributed 
system the function G being optimized is “wide”, taking the state of the entire distributed 
system as its argument. In some such distributed systems it is only in the crudest sense that 
the individual variables can be viewed as players in a non-cooperative game. Examples include 
simulated annealing (SA [13]) and swarm intelligence [1], inspired by spin relaxation in physics 
and eusocial insect colonies, respectively. 

In other distributed systems that have a wide G, the value of each of the individual variables
going into G is set by a player engaged in an overarching non-cooperative game where each
player $\eta$ is trying to maximize its associated payoff utility function $g_\eta$. Roughly speaking, such
collective systems work when the utility functions of the individual variables/players are all
“aligned” with the world utility G. Under these circumstances, as the individual players pursue
their self-interests, the global goal for the full collective of maximizing G is achieved “as a
side-effect”. The primary naturally-occurring analogues of such collectives are economic institutions
where the players are human beings, e.g., auctions and clearing of markets. In the computational
translations of such systems the players are computer programs [20, 4, 12].

The “Collective INtelligence” (COIN) framework concerns the design of such collectives.
In particular, it addresses the issue of how to generate from a provided G the set of utilities
$\{g_\eta\}$ that have optimal signal/noise for each player $\eta$ while also having the property that as
the individual players maximize those utilities, G also gets maximized. This work on design
of collectives can be viewed as an extension of mechanism design [9] beyond human economics,
to include concern for signal-to-noise ratio in the payoff functions and off-equilibrium behavior,
to allow far more freedom in the choice of the $g_\eta$ than exists with human players (for example
to encompass computational systems in which the issue of incentive compatibility is moot),
and to encompass arbitrary G and arbitrary dynamics of the system. Applications of this
framework to problems ranging from routing in telecommunication networks [21, 24] to congestion
problems [25] have resulted in substantial performance improvement over conventional techniques
that do not consider issues like signal-to-noise. Typically, as the size of the collective grows, such
improvements reach several orders of magnitude.

Recent mathematical developments have shown that the previously developed COIN algorithms
for design of collectives were based on only two of the three factors determining performance
at maximizing G. Consideration of only the third factor would instead lead to conventional
wide-G systems like simulated annealing that have little to do with non-cooperative games.
Consideration of all three terms at once therefore would result in an algorithm that combines the
two types of wide-G function maximization system, with naturally-occurring analogues of human
economics and statistical physics, respectively.

In this paper we present such an algorithm. Because of its similarity to (certain aspects of) 
how human corporations are run, we call it the Computational Corporation (CoCo) algorithm. 
Roughly speaking, it works by modifying the exploration step of simulated annealing by having 
the new values of the variables set by the moves of intelligent players in a non-cooperative game. 
Like simulated annealing, the computational corporation algorithm is not intended to give the
best possible performance in all problem domains; an algorithm laboriously tailored for a
particular domain will invariably perform best for that domain [23]. Rather, like other algorithms
related to naturally-occurring distributed systems, the computational corporation algorithm is
intended as a powerful and broadly applicable “off-the-shelf” algorithm.

We present experiments demonstrating that the computational corporation algorithm outperforms
simulated annealing by several orders of magnitude for spin glass relaxation and bin-packing.
In the spin glass domain CoCo converges to a given value of G over two orders of
magnitude faster than does SA, with far better scaling behavior (the ratio of their convergence
speeds increased exponentially with the size of the problem). In the bin packing problem, both
CoCo and conventional COIN algorithms significantly outperform SA (up to three orders of
magnitude faster convergence). CoCo achieves the optimum solution in a higher percentage of the
runs (82% vs. 56%) than does the COIN algorithm, and provides better “worst case” properties
(i.e., the worst result obtained through CoCo is better than the worst result obtained through
COIN).

2 The Mathematics of Collective Intelligence


The full formalization of the COIN framework extends significantly beyond what is needed for
this paper.¹ The restricted version needed here starts with an arbitrary vector space Z whose
elements $\zeta$ give the state of all the variables in the collective.

We wish to search for the $\zeta$ that maximizes the provided world utility G. In addition to G
we are concerned with payoff utility functions $\{g_\eta\}$, one such function for each variable/player
$\eta$. We use the notation $\hat{\eta}$ to refer to all players other than $\eta$.

We will need to have a way to “standardize” utility functions so that the numeric value they
assign to a $\zeta$ only reflects their ranking of $\zeta$ relative to certain other elements of Z. We call such
a standardization of utility U for player $\eta$ the “intelligence for $\eta$ at $\zeta$ with respect to U”. Here
we will use intelligences that are equivalent to percentiles:

$$\epsilon_U(\zeta : \eta) \equiv \int d\mu_{\zeta_{\hat{\eta}}}(\zeta') \, \Theta[U(\zeta) - U(\zeta')] , \tag{1}$$

where the subscript on the (normalized) measure indicates it is restricted to $\zeta'$ sharing the same
non-$\eta$ components as $\zeta$, and where the Heaviside function $\Theta$ is defined to equal 1 when its
argument is greater than or equal to 0, and to equal 0 otherwise. Note that an intelligence value
is always between 0 and 1.
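
As a minimal sketch of Eq. 1, assume that player $\eta$ has a finite set of candidate moves and that the measure is uniform over them (both are assumptions made for illustration; the names `intelligence`, `toy_utility` and `moves` are hypothetical). The intelligence is then the fraction of $\eta$'s alternative moves that do no better than its actual move, with the other players' moves held fixed:

```python
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]


def intelligence(utility: Callable[[State], float], state: State,
                 player: int, moves: Sequence[int]) -> float:
    """Percentile-style intelligence of `player` at `state` (cf. Eq. 1).

    Assumes a uniform measure over the finite set `moves`.
    """
    actual = utility(state)
    count = 0
    for m in moves:
        # Alternative state: same non-player components, different move.
        alt = state[:player] + (m,) + state[player + 1:]
        # Heaviside term: 1 when the actual move is at least as good.
        if actual - utility(alt) >= 0:
            count += 1
    return count / len(moves)


# Toy utility (hypothetical): the sum of all components.
toy_utility = lambda z: float(sum(z))
best = intelligence(toy_utility, (2, 0, 1), player=0, moves=[0, 1, 2])
worst = intelligence(toy_utility, (0, 0, 1), player=0, moves=[0, 1, 2])
```

In this toy case the best move of player 0 has intelligence 1, while its worst move only ties with itself and so has intelligence 1/3.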

Our uncertainty concerning the behavior of the system is reflected in a probability distribution
over Z. Our ability to control the system consists of setting the value of some characteristic of
the collective, e.g., setting the payoff functions of the players. Indicating that value by s, our
analysis revolves around the following central equation for P(G | s), which follows from Bayes'
theorem:

$$P(G \mid s) = \int d\vec{\epsilon}_G \, P(G \mid \vec{\epsilon}_G, s) \int d\vec{\epsilon}_g \, P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s) \, P(\vec{\epsilon}_g \mid s) , \tag{2}$$

where $\vec{\epsilon}_g$ is the vector of the intelligences of the players with respect to their associated payoff
functions, and $\vec{\epsilon}_G$ is the vector of the intelligences of the players with respect to G.
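
One way to see where the three terms come from is sketched here, under the assumption that the second term is the conditional $P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s)$. Marginalize twice and apply the product rule each time:

$$\begin{aligned}
P(G \mid s) &= \int d\vec{\epsilon}_G \, P(G, \vec{\epsilon}_G \mid s) = \int d\vec{\epsilon}_G \, P(G \mid \vec{\epsilon}_G, s) \, P(\vec{\epsilon}_G \mid s) , \\
P(\vec{\epsilon}_G \mid s) &= \int d\vec{\epsilon}_g \, P(\vec{\epsilon}_G, \vec{\epsilon}_g \mid s) = \int d\vec{\epsilon}_g \, P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s) \, P(\vec{\epsilon}_g \mid s) .
\end{aligned}$$

Substituting the second line into the first gives the nested-integral form of the central equation.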

If we can choose s so that the third term in the integrand is peaked around vectors $\vec{\epsilon}_g$ all of
whose components are close to 1, then we have likely induced large (payoff function) intelligences.
If we can also have the second term be peaked about $\vec{\epsilon}_G$ equal to $\vec{\epsilon}_g$, then $\vec{\epsilon}_G$ will also be large.
Finally, if the first term in the integrand is peaked about high G when $\vec{\epsilon}_G$ is large, then our choice
of s will likely result in high G, as desired.

Intuitively, it is in the third term that the requirement that payoff functions have high “signal- 
to-noise” (an issue not considered in conventional work in mechanism design) arises. It is in the 
second term that the requirement that the payoff functions be “aligned with G” arises. Previously 
developed COIN algorithms concentrated on these two terms. Finally, conventional function 
maximization techniques like simulated annealing instead are concerned with having term 1 have 
the desired form. 

It is the simultaneous concern for all three of these terms that underlies the CoCo algorithm. 
To present that algorithm we must first review some COIN results on how to simultaneously set 
terms 2 and 3 to have the desired form. 

The second term in Eq. 2 can be addressed by requiring that the collective be factored,
which means that $\vec{\epsilon}_g$ equals $\vec{\epsilon}_G$ exactly for all $\zeta$. In game-theory language, the Nash equilibria of
a factored collective are critical points of G. In addition to this desirable equilibrium behavior,
factored collectives incorporate appropriate off-equilibrium incentives to the players. As a trivial
example, any “team game” in which all the payoff functions equal G is factored [8]. However,
team games often have very poor forms for term 3 in Eq. 2, forms which get progressively worse
as the size of the collective grows. This is because for such payoff functions each player $\eta$ will
usually confront a very poor “signal-to-noise” ratio in trying to discern how its actions affect its
payoff $g_\eta = G$, since so many other players’ actions affect G and therefore affect $g_\eta$.

1 That framework encompasses, for example, arbitrary reassignments of how the various subsets of the variables
comprising the collective are assigned to players. It also encompasses modification of the players’ information sets
(i.e., modification of inter-player communication). See [22].

Previous COIN algorithms were based on varying the payoff functions $\{g_\eta\}$ to optimize the
third term, subject to the requirement that the system be factored. To understand how those
algorithms work, given a measure $d\mu(\zeta_\eta)$, define the opacity at $\zeta$ of utility U as:

$$\Omega_U(\zeta : \eta, s) \equiv \int d\zeta' \, J(\zeta' \mid \zeta) \, \frac{|U(\zeta) - U(\zeta'_{\hat{\eta}}, \zeta_\eta)|}{|U(\zeta) - U(\zeta_{\hat{\eta}}, \zeta'_\eta)|} , \tag{3}$$

where J is defined in terms of the underlying probability distributions.²

The denominator absolute value in the integrand in Eq. 3 reflects how sensitive $U(\zeta)$ is to
changing $\zeta_\eta$. In contrast, the numerator absolute value reflects how sensitive $U(\zeta)$ is to changing
$\zeta_{\hat{\eta}}$. So the smaller the opacity of a payoff function $g_\eta$, the more $g_\eta(\zeta)$ depends only on the
move of player $\eta$, i.e., the better the associated signal-to-noise ratio for $\eta$. Intuitively then, lower
opacity means it is easier for $\eta$ to achieve a large value of its intelligence.

To formalize this, choose a measure $d\mu$ defining opacity that is the same as the one defining
intelligence. Then expected opacity bounds how close to 1 expected intelligence can be [22]:

$$E(\epsilon_U(\zeta : \eta) \mid s) \le 1 - K , \quad \text{where} \quad K \le E(\Omega_U(\zeta : \eta, s) \mid s) . \tag{4}$$

In practice the bounds in this result are usually tight.

Next define a difference utility as one of the form:

$$U(\zeta) = G(\zeta) - \Gamma(f(\zeta)) , \tag{5}$$

where $\Gamma(f)$ is independent of $\zeta_\eta$. In general it is not possible for a collective both to be factored
and to have zero opacity for all of its players. However, any difference utility is factored [22]. In
addition, under usually benign approximations, $E(\Omega_U \mid s)$ is minimized over the set of difference
utilities by choosing:

$$\Gamma(f(\zeta)) = E(G \mid \zeta_{\hat{\eta}}, s) , \tag{6}$$

up to an overall additive constant. We call the resultant difference utility the Aristocrat utility
(AU), loosely reflecting the fact that it measures the difference between a player’s actual action
and the average action.
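
The following is a hedged sketch of a difference utility of the AU type for a player with a finite move set. The expectation over the player's own moves is taken under supplied probabilities, defaulting to uniform; the function name `aristocrat_utility`, the toy world utility `G` and the move set are illustrative assumptions rather than part of the framework.

```python
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[int, ...]


def aristocrat_utility(G: Callable[[State], float], state: State,
                       player: int, moves: Sequence[int],
                       probs: Optional[Sequence[float]] = None) -> float:
    """G at the actual state minus the expected G over the player's moves.

    The subtracted term does not depend on the player's actual move, so the
    result is a difference utility. `probs` defaults to uniform (assumed).
    """
    if probs is None:
        probs = [1.0 / len(moves)] * len(moves)
    expected = 0.0
    for m, p in zip(moves, probs):
        alt = state[:player] + (m,) + state[player + 1:]
        expected += p * G(alt)
    return G(state) - expected


# Toy world utility (hypothetical) over three +/-1 variables.
G = lambda z: float(z[0] * z[1] + z[2])
moves = (-1, 1)
au_up = aristocrat_utility(G, (1, 1, 1), 0, moves)
au_down = aristocrat_utility(G, (-1, 1, 1), 0, moves)
```

Because the subtracted term is the same for both of player 0's moves, the change in this utility when the player switches moves equals the change in G, which is the factoredness property in this toy case.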

If possible, we would like each player $\eta$ to use the associated AU as its payoff function. This
is not always feasible, however. The problem is that to evaluate the expectation value defining
its AU, each player needs to evaluate the current probabilities of each of its potential actions.
However, if the player then changes its payoff function to be the associated AU, it will in general
substantially change its ensuing behavior, and with it those probabilities. (The player now wants to choose


2 Writing it out in full, J«' ( <) = J(Gj> <' | C ip s )/^(Ct? I Cv s )> with: 

, _ p(c n 1 <v>») p K 1 , p K 1 ic;,*)^) 

»HCr?iC |C'Tp s /— 0 + 9 
actions that maximize a different function from the one it was maximizing before.) In other
words, it will change the probabilities of its actions, which means that its new payoff function
is in fact not the AU for its actual probabilities. There are ways around this self-consistency
problem, but in practice it is often easier to bypass the entire issue by giving each $\eta$ a payoff
function that does not depend on the probabilities of $\eta$’s own actions.

One such payoff function is the Wonderful Life Utility (WLU). The WLU for player $\eta$ is
parameterized by a pre-fixed clamping element $CL_\eta$ chosen from among $\eta$’s possible actions:

$$WLU_\eta \equiv G(\zeta) - G(\zeta_{\hat{\eta}}, CL_\eta) . \tag{7}$$

WLU is factored no matter what the choice of clamping element. Furthermore, while not matching
the low opacity of AU, WLU usually has far better opacity than does a team game.
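
Eq. 7 can be sketched in a few lines for a discrete state. The helper name `wonderful_life_utility` and the toy world utility are assumptions made for illustration.

```python
from typing import Callable, Tuple

State = Tuple[int, ...]


def wonderful_life_utility(G: Callable[[State], float], state: State,
                           player: int, clamp: int) -> float:
    """WLU of `player`: G at the actual state minus G with the player's
    component replaced by the fixed clamping element (cf. Eq. 7)."""
    clamped = state[:player] + (clamp,) + state[player + 1:]
    return G(state) - G(clamped)


# Toy world utility (hypothetical): best when the components sum to 3.
G = lambda z: -float((sum(z) - 3) ** 2)

a = (1, 2, 0)  # player 1 plays 2
b = (1, 1, 0)  # player 1 plays 1, all else unchanged
wlu_a = wonderful_life_utility(G, a, player=1, clamp=0)
wlu_b = wonderful_life_utility(G, b, player=1, clamp=0)
```

The clamped term is identical for `a` and `b`, so the difference in WLU between the two moves equals the difference in G. This holds for any choice of `clamp`.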

In many circumstances one can meaningfully interpret a particular choice of clamping element
for player $\eta$ as equivalent to a “null” action for player $\eta$, equivalent to removing that player from
the system. (Hence the name of this payoff function; cf. the Frank Capra movie.) For such a
clamping element, assigning the associated WLU to $\eta$ as its payoff function is closely related to
the economics technique of “endogenizing a player’s externalities” [9]. However, it is usually the
case that using WLU with a clamping element that is as close as possible to the expected action
defining AU results in far lower opacity than does clamping to the null action. Accordingly,
use of such an alternative WLU almost always results in far better values of G than does the
“endogenizing” WLU.

Typically, COINs in which the payoff functions are WLU or AU far outperform not only
team games, but also conventional function maximization techniques like simulated annealing.
However, note that even if the payoff functions result in the collective’s having every component
of the vector $\vec{\epsilon}_G$ equal 1 (the best terms 2 and 3 can be), nothing in Eq. 2 precludes a
poor value for $G(\zeta)$. This is because having all those intelligences equal 1 only means that the
collective is at a local maximum of G.

This potential shortcoming is reflected in the first term in Eq. 2, a term that does not directly
depend on the choice of the players’ payoff functions. Crudely speaking, what that term reflects
is the propensity of the system to get stuck in a local maximum. Accordingly, one can use
many of the conventional exploration/exploitation function maximization techniques like simulated
annealing to induce a good form for that term. In the CoCo algorithm, at each iteration the
exploration step is determined by the actions chosen by the players, rather than by using one of
the more “blind” sampling schemes that are traditionally employed. The exploitation step, though,
is the same as in the traditional formulation of the conventional technique. In this way all three
terms of Eq. 2 will have a desired form, and the induced G should be large.
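
The loop just described can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the players are simple epsilon-greedy learners that keep running averages of their WLU payoffs, the exploitation step is taken to be the usual Metropolis acceptance rule of simulated annealing, and all names and parameter values (`coco_maximize`, `epsilon`, `cool_every`, and so on) are hypothetical.

```python
import math
import random


def coco_maximize(G, n_players, moves, steps=500, temp=1.0, cooling=0.9,
                  cool_every=100, epsilon=0.1, clamp=0, seed=0):
    """CoCo-style maximization of G (hedged sketch, assumptions above)."""
    rng = random.Random(seed)
    state = tuple(rng.choice(moves) for _ in range(n_players))
    # Running-average payoff estimate per player and per move.
    value = [{m: 0.0 for m in moves} for _ in range(n_players)]
    count = [{m: 0 for m in moves} for _ in range(n_players)]
    current_G = G(state)
    best_state, best_G = state, current_G
    for t in range(1, steps + 1):
        # Exploration step: the players choose the proposed joint move.
        proposal = []
        for i in range(n_players):
            if rng.random() < epsilon:
                proposal.append(rng.choice(moves))
            else:
                proposal.append(max(moves, key=lambda m: value[i][m]))
        proposal = tuple(proposal)
        proposal_G = G(proposal)
        # Each player updates its estimate with its WLU at the proposal.
        for i in range(n_players):
            clamped = proposal[:i] + (clamp,) + proposal[i + 1:]
            payoff = proposal_G - G(clamped)
            m = proposal[i]
            count[i][m] += 1
            value[i][m] += (payoff - value[i][m]) / count[i][m]
        # Exploitation step: Metropolis acceptance on the world utility.
        delta = proposal_G - current_G
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            state, current_G = proposal, proposal_G
            if current_G > best_G:
                best_state, best_G = state, current_G
        # Geometric cooling after a fixed number of steps.
        if t % cool_every == 0:
            temp *= cooling
    return best_state, best_G


# Toy world utility (hypothetical): the number of players choosing 1.
best_state, best_G = coco_maximize(lambda z: float(sum(z)), 8, [0, 1])
```

In this toy problem each player's WLU for move 1 is always 1 and for move 0 is always 0, so the learners quickly steer the proposals toward the all-ones state.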

In its concern for all three terms this algorithm bears many similarities to well-run modern human
corporations, with G the “bottom line” of the entire corporation, the players $\eta$ identified with
the employees of the corporation, and the associated $g_\eta$ being the employees’ performance-based
compensation packages. For example, for a “factored corporation”, each employee’s compensation
package function contains incentives designed so that the better the employee performs
their job, the better the bottom line of the corporation. In addition, if the compensation packages
are “low opacity”, the employees will have a relatively easy time discerning the relationship
between their behavior and their compensation. Finally, the centralized exploitation process in
CoCo is similar to the centralized decision-making of upper management that tries to determine
whether to abandon or stick with a particular set of behaviors by the employees. It is due to
these similarities that we call this algorithm the computational corporation algorithm.

3 Experiments

The purpose of this section is both to show the wide applicability of CoCo and to provide a
comparative analysis showing its superiority over simulated annealing. For this purpose we
chose two diverse application domains:

• Minimum energy configurations for binary spin glasses (Theoretical physics) 

• Bin packing (Industrial engineering and resource allocation) 

In the experiments reported below, for simulated annealing, each agent (spin, item) had a
25% probability of changing its action (i.e., on average, the new state differed by 25% from
the previous state). The annealing schedule consisted of reducing the temperature (multiplying it
by 0.9) after a fixed number of time steps had elapsed: 500 for spin glasses (on the convergence
runs reported in the text) and 100 for bin packing. The players in the CoCo algorithm were handicapped
by using perhaps the simplest possible reinforcement learning algorithm [6, 18, 24, 25]. The AU
version of CoCo simply assumed that each agent $\eta$ had a uniform probability distribution over
its possible moves. Unless otherwise specified, the clamping elements in the COIN versions were
set to 0 (vector of zeroes).
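
For concreteness, the simulated annealing baseline described above might look like the sketch below for ±1 variables. The 25% change probability and the multiply-by-0.9 cooling come from the text; the starting temperature, the Metropolis acceptance rule and all names are assumptions.

```python
import math
import random


def simulated_annealing(energy, state, steps=2000, temp=1.0, cooling=0.9,
                        cool_every=500, flip_prob=0.25, seed=0):
    """Minimize `energy` over lists of +/-1 values (hedged sketch)."""
    rng = random.Random(seed)
    current = list(state)
    current_E = energy(current)
    best, best_E = list(current), current_E
    for t in range(1, steps + 1):
        # Blind exploration: each variable flips with probability flip_prob.
        proposal = [-s if rng.random() < flip_prob else s for s in current]
        proposal_E = energy(proposal)
        delta = proposal_E - current_E
        # Metropolis acceptance (assumed form of the exploitation step).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_E = proposal, proposal_E
            if current_E < best_E:
                best, best_E = list(current), current_E
        # Cool by a factor of 0.9 after a fixed number of steps.
        if t % cool_every == 0:
            temp *= cooling
    return best, best_E


# Toy energy (hypothetical): lowest when every variable is +1.
toy_energy = lambda s: -float(sum(s))
start = [1, -1, 1, -1, 1, -1]
best, best_E = simulated_annealing(toy_energy, start, steps=200, cool_every=50)
```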

3.1 Binary Spin Glasses — The 2-D Ising Model 

Spin glasses have traditionally been viewed as one of the pillars of statistical mechanics and 
the preferred comparative domain for analyzing the efficacy of various stochastic relaxation and 
optimization techniques [14]. There are many optimization methods developed precisely for 
solving the spin glass problem [14] and in this article we are not aiming to improve upon such 
specialized methods. Rather, we use this domain to compare two multi-purpose optimization 
algorithms, namely CoCo and simulated annealing. 

In this article, we restrict our attention to the 2-D Ising Model, i.e., a special spin glass where
each site can occupy only one of two possible states: spin up or down in ferromagnetism, and
empty or occupied in modeling the liquid/gas phase transition. Because exact algorithmic solutions
for the two-dimensional Ising model have already been developed, 2-D binary spin glasses are
considered a standard benchmarking tool [14]. Briefly, this problem consists of a four-connected
two-dimensional grid with periodic boundary conditions. Each site $s_i$ can have one of two spins
(here taken to be 1 or -1). The link between any two sites, $d_{ij}$, is an arbitrary value (here,
without loss of generality, restricted to between -1 and 1). The goal is to find the states of the
spin glass such that the global energy defined by:

$$\sum_i \sum_j s_i \, s_j \, d_{ij}$$

is minimized. It is easy to show that this problem has many local minima, and for any reasonably
sized grid, examining all possible states quickly becomes an intractable problem [14] ($2^n$ states
for n sites).
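
The energy above can be illustrated on a tiny grid where brute force is still feasible. In this sketch, which is an assumption about bookkeeping rather than the authors' code, each nearest-neighbor link is counted once and stored in two hypothetical arrays, `links_right` and `links_down`, with periodic wrap-around.

```python
import itertools


def ising_energy(spins, links_right, links_down):
    """Sum of s_i * s_j * d_ij over the links of a periodic square grid."""
    n = len(spins)
    total = 0.0
    for r in range(n):
        for c in range(n):
            s = spins[r][c]
            # Right and down neighbors, wrapping at the grid boundary.
            total += s * spins[r][(c + 1) % n] * links_right[r][c]
            total += s * spins[(r + 1) % n][c] * links_down[r][c]
    return total


def all_states(n):
    """Yield every +/-1 configuration of an n x n grid (2**(n*n) of them)."""
    for flat in itertools.product((-1, 1), repeat=n * n):
        yield [list(flat[r * n:(r + 1) * n]) for r in range(n)]


n = 3
links = [[-1.0] * n for _ in range(n)]  # toy choice: every link equals -1
aligned = [[1] * n for _ in range(n)]
aligned_energy = ising_energy(aligned, links, links)
state_count = sum(1 for _ in all_states(n))
brute_min = min(ising_energy(s, links, links) for s in all_states(n))
```

Even this 3x3 grid has 512 states; a 10x10 grid has $2^{100}$, which is why exhaustive search is not an option.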

In modeling this binary grid, we mapped each site $\zeta_k$ to an independently active agent, with
its chosen action at time t represented by the binary choice $s_{k,t}$ (i.e., the choice of spin up or
spin down). Each agent selects its next action/state using the COIN/CoCo framework.

Figure 1(a) shows the performance (averaged over 50 runs) of simulated annealing vs. AU
CoCo and AU COIN for a 10x10 Ising model. As can be seen, CoCo far outperforms simulated
annealing. Comparing the exploration steps of CoCo and simulated annealing shows the advantage
gained through CoCo. Notice that the gap between the exploration and exploitation steps
of CoCo is far narrower than that of simulated annealing, emphasizing the “guided” aspect of
the search used by CoCo.

Figure 1: (a) Performance (left); (b) Scaling (right) for the spin glass problem.

Comparing the performance of SA to that of CoCo, one interesting question that arises is
whether SA will reach the CoCo results given sufficient time, and if so, how much longer it will
take to do so. On the 10x10 grid discussed above, the best simulated annealing schedule used
required two orders of magnitude longer to converge, and even then failed to reach the CoCo
results.³ As the scaling runs discussed below indicate, one expects this discrepancy to increase
with the size of the problem.

A final noteworthy result is the increased scalability obtained through CoCo. Figure 1(b)
shows the ratio of the global utility achieved using simulated annealing to the global utility
achieved through CoCo as a function of the grid size. The exponential decline in this ratio
clearly shows that the larger the size of the binary spin glass grid, the greater the impact of
CoCo.


3.2 Bin-Packing 

The second problem we investigate is the bin packing problem [7]. This problem consists of a list
of items $(a_1, a_2, \ldots, a_n)$ and a supply of n bins $B_i$, with capacity C each. The size of each item
is given by a function $s(a_i)$ which satisfies $0 < s(a_i) \le C, \forall i$. The problem consists of packing
the items into a minimum number of bins, while ensuring that the contents of no bin exceed C.
More precisely, the problem consists of minimizing:

$$G = \sum_{i=1}^{n} I_{B_i} , \quad \text{subject to:} \quad \sum_{a_j \in B_i} s(a_j) \le C \tag{8}$$

where $I_{B_i}$ is the “content-indicator” function for bin $B_i$, and is 0 if the bin is empty and 1 otherwise.
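
A small sketch of how Eq. 8 could be evaluated for a candidate packing follows. Representing a packing as a list that maps each item to a bin index is an assumption, as is the helper name `bins_used`.

```python
def bins_used(assignment, sizes, capacity):
    """Return (G, feasible) for a packing.

    assignment[k] is the index of the bin holding item k. G counts the
    non-empty bins; `feasible` is True when no bin exceeds `capacity`.
    """
    loads = {}
    for item, b in enumerate(assignment):
        loads[b] = loads.get(b, 0.0) + sizes[item]
    feasible = all(load <= capacity for load in loads.values())
    return len(loads), feasible


sizes = [4, 6, 5, 5]
good = bins_used([0, 0, 1, 1], sizes, capacity=10)  # two full bins
bad = bins_used([0, 0, 0, 1], sizes, capacity=10)   # bin 0 holds 15
```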

This problem has many real world applications, ranging from loading trucks subject to weight
limitations [7] to distributing jobs on a computer network where each processor has limited
resources (e.g., memory, CPU cycles) [16]. The bin-packing problem is known to be NP-complete
[10], and many approximate algorithms have been developed to address it [3, 7]. In this
section we study how the COIN framework can be applied to this domain.

3 An optimistic extension of the SA performance under ideal annealing schedules projects SA reaching CoCo 
results 500 times more slowly than CoCo in this relatively small problem. 

In these experiments the agents use a “soft” version of the global utility function given by:


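$$G_{soft} = \begin{cases} \sum_{i=1}^{n} \left[ \left(\frac{C}{2}\right)^2 - \left(x_i - \frac{C}{2}\right)^2 \right] & \text{if } x_i \le C \\ \sum_{i=1}^{n} \left(x_i - \frac{C}{2}\right)^2 & \text{if } x_i > C \end{cases} \tag{9}$$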


where $x_i = \sum_{a_j \in B_i} s(a_j)$ gives the total size of all items in bin i. This function has two minima
(at 0 and C) and provides two benefits. First, by discouraging “illegal” solutions through the
large penalty incurred by exceeding C in any one bin, it greatly reduces the need to verify that a
solution satisfies the constraints after that solution is found. Second, it provides the system with
a better “signal” and encourages bins to be closer to full or empty. All algorithms (including
simulated annealing) use this soft G function, but they are all evaluated based on the provided
global reward (Eq. 8).
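
To make the shape of the soft utility concrete, here is a sketch under an assumed piecewise reading of Eq. 9: a bin with load $x_i \le C$ contributes $(C/2)^2 - (x_i - C/2)^2$, and an overfull bin contributes $(x_i - C/2)^2$. The function names are hypothetical.

```python
def soft_bin_term(load, capacity):
    """Per-bin contribution to the soft utility (assumed piecewise form)."""
    half = capacity / 2.0
    if load <= capacity:
        # Zero for an empty or exactly full bin, largest when half full.
        return half ** 2 - (load - half) ** 2
    # Overfull bin: penalty that grows with the excess load.
    return (load - half) ** 2


def g_soft(loads, capacity):
    """Soft world utility, to be minimized, summed over all bins."""
    return sum(soft_bin_term(x, capacity) for x in loads)


packed = g_soft([10, 10, 0, 0], capacity=10)  # two full bins, two empty
spread = g_soft([5, 5, 5, 5], capacity=10)    # four half-full bins
```

Under this reading, packings that leave bins exactly full or empty score lowest, while half-full and overfull bins are penalized.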

In these experiments, all the algorithms had the same number of iterations (1000 in this
case), and the results we report below are averaged over 50 runs.⁴ Note that COIN-based systems
used the first 200 steps to generate their “learning” data, and thus took random actions during 
this interval. In simulated annealing, the proposal distribution was slowly modified to generate 
solutions that differed in fewer items than the current solution as the experiment progressed. The 
annealing schedule consisted of reducing the Boltzmann temperature at intervals of 100 steps. 


Table 1: Performance at t = 1000

Algorithm     Average         Best   Worst   Reached Optimal
CoCo WLU      4.17 ± 0.05     4      5       82%
CoCo TG       14.28 ± 0.16    12     16      0%
COIN WLU      4.46 ± 0.08     4      6       56%
COIN TG       15.76 ± 0.23    13     18      0%
Sim Anneal    15.67 ± 0.18    12     18      0%


Table 1 summarizes the results of the various algorithms for the bin packing problem. The
average performances of simulated annealing and team game COIN were statistically indistinguishable.
Neither fared well, with the worst solution in both cases being random. CoCo team
game performed slightly better, but the real gains were not achieved until the WLU private utility
function was used.⁵ Though the CoCo WLU slightly outperformed the straight COIN WLU
(lower average and higher percentage of finding the optimal solution), both of these algorithms
were significantly superior to the other three.

Although the results reported above show the superiority of the WLU-based algorithms, they
do not fully reflect the advantages of COIN-based systems. One aspect of algorithm performance
that is of paramount importance in optimization problems is the speed of convergence.
The two WLU-based algorithms both converged to near optimal solutions within the first 50 steps
following the learning period. Even projecting the team game CoCo and simulated annealing
performances linearly,⁶ they were two and three orders of magnitude slower, respectively, than
the WLU-based algorithms.

4 The errors in the mean are reported as plus/minus in the table, and omitted in the graph because the resulting 
error bars are too small to see. 

5 For this problem, there was no difference (statistically) between the performance of WLU and AU. Therefore 
to streamline the comparative process, we report only WLU results. 

6 This favors TG and SA since in reality their convergence rate drops. 

4 Conclusion


There are three general types of parallel systems found in nature that can be viewed as engaging 
in maximization of a function G. These are exemplified by neo-Darwinian natural selection 
(for G that take any single one of the elements of the parallel system as an argument), spin 
glass relaxation (for G that take the entire system as argument), and clearing of markets in
economics (for G that take the entire system as argument and in which the overall
parallel system can be viewed as a non-cooperative game). All three types of system have been 
translated into computational algorithms, exemplified by genetic algorithms, simulated annealing, 
and computational markets, respectively. 

The Collective Intelligence framework can be viewed as an extension of conventional economics-based
systems of the third type, to reflect signal-to-noise issues and greater freedom in modifying
the individual players than exist in economies of human beings. It has traditionally been applied 
only to systems of the third type. Recent mathematical advances in that framework have shown 
that those traditional COIN algorithms only account for two of the three factors determining 
performance. The third factor can be accounted for by integrating the COIN with a technique 
of the second type, like simulated annealing. Intuitively, such an integrated system, which we 
call a computational corporation, can be viewed as conventional simulated annealing modified by 
having the value of each variable in the exploration step of the SA be set by a (computer-based) 
player in an associated non-cooperative game. Doing this allows the leveraging of the intelligence 
of such players to improve the exploration, and thereby improve the performance. 

We have presented experiments demonstrating that the computational corporation algorithm outperforms
simulated annealing by several orders of magnitude for spin glass relaxation and bin-packing.
In the spin glass domain CoCo converges to a given value of G over two orders of
magnitude faster than does SA, with far better scaling behavior (the ratio of their convergence
speeds increased exponentially with the size of the problem). In the bin packing problem, both
CoCo and conventional COIN algorithms significantly outperform SA (up to three orders of
magnitude faster convergence).